Non-Contiguous Tree Parsing
Authors
Abstract
Pairing structural descriptions in MT, syntax-semantics interfaces and so on becomes more difficult the more the languages involved differ structurally; there is, implicitly or explicitly, a process of ‘tree parsing’, where a structural description is split into smaller component trees so that transfer rules can be applied. Recent work has looked at the construction of transfer rules, using both symbolic and statistical approaches, that require the pairing of groups of several contiguous nodes in structural descriptions. We look at the case where pairings of groups of non-contiguous nodes are necessary, and present an efficient dynamic programming algorithm based on TAG and drawing on compiler theory for a decomposition into appropriate groupings. We then examine the formal properties of this algorithm, and show that it is linear in the number of nodes in the tree and has the same complexity as existing algorithms requiring only groupings of contiguous nodes.
Similar resources
A non-contiguous Tree Sequence Alignment-based Model for Statistical Machine Translation
The tree sequence based translation model allows the violation of syntactic boundaries in a rule to capture non-syntactic phrases, where a tree sequence is a contiguous sequence of subtrees. This paper goes further to present a translation model based on non-contiguous tree sequence alignment, where a non-contiguous tree sequence is a sequence of sub-trees and gaps. Compared with the contiguous...
Introducing Non-Syntactic Phrases into a Syntax-Based Machine Translation System
The dominance of traditional phrase-based statistical machine translation (SMT) models (Koehn, Och, and Marcu, 2003) has recently been challenged by the development and improvement of a number of newer translation models that explicitly take into account the syntax of the sentences being translated. One simple approach to incorporating syntax is to limit the phrases learned by a standard SMT tra...
Using LocalMaxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units
The availability of contiguous and non-contiguous multiword lexical units (MWUs) in Natural Language Processing (NLP) lexica enhances parsing precision, helps attachment decisions, improves indexing in information retrieval (IR) systems, reinforces information extraction (IE) and text mining, among other applications. Unfortunately, their acquisition has long been a significant problem in NLP, ...
Non-Contiguous Pattern Avoidance in Binary Trees
In this paper we consider the enumeration of binary trees avoiding non-contiguous binary tree patterns. We begin by computing closed formulas for the number of trees avoiding a single binary tree pattern with 4 or fewer leaves and compare these results to analogous work for contiguous tree patterns. Next, we give an explicit generating function that counts binary trees avoiding a single non-con...
Non-Projective Dependency Parsing using Spanning Tree Algorithms
We formalize weighted dependency parsing as searching for maximum spanning trees (MSTs) in directed graphs. Using this representation, the parsing algorithm of Eisner (1996) is sufficient for searching over all projective trees in O(n^3) time. More surprisingly, the representation is extended naturally to non-projective parsing using Chu-Liu-Edmonds (Chu and Liu, 1965; Edmonds, 1967) MST algorit...